Portable High Performance and Scalability of Partitioned Global Address Space Languages

Authors

  • Cristian Coarfa
  • Peter Joseph Varman
Abstract

Large-scale parallel simulations are fundamental tools for engineers and scientists. Consequently, it is critical to develop both programming models and tools that enhance development-time productivity, enable harnessing of massively parallel systems, and guide the diagnosis of poorly scaling programs. This thesis addresses this challenge in two ways. First, we show that Co-array Fortran (CAF), a shared-memory parallel programming model, can be used to write scientific codes that exhibit high performance on modern parallel systems. Second, we describe a novel technique for analyzing parallel program performance and identifying scalability bottlenecks, and apply it across multiple programming models.

Although the message passing parallel programming model provides both portability and high performance, it is cumbersome to program. CAF eases this burden by providing a partitioned global address space, but it had until now been implemented only on shared-memory machines. To significantly broaden CAF's appeal, we show that CAF programs can deliver high performance on commodity cluster platforms. We designed and implemented cafc, the first multiplatform CAF compiler, which transforms CAF programs into Fortran 90 plus communication primitives. Our studies show that CAF applications matched or exceeded the performance of the corresponding message passing programs. For good node performance, cafc employs an automatic transformation called procedure splitting; for high performance on clusters, we vectorize and aggregate communication at the source level. We also extend CAF with hints that enable overlapping communication with computation. Overall, our experiments show that CAF versions of the NAS benchmarks match the performance of their MPI counterparts on multiple platforms.

The increasing scale of parallel systems makes it critical to pinpoint and fix scalability bottlenecks in parallel programs. To automate this process, we present a novel analysis technique that uses parallel scaling expectations to compute scalability scores for calling contexts, and then guides an analyst to hot spots using an interactive viewer. Our technique is general and may thus be applied to several programming models; in particular, we used it to analyze CAF and MPI codes, among others. Applying our analysis to CAF programs highlighted the need for language-level collective operations, which we both propose and evaluate.
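To make the programming model concrete, the following is a minimal Co-array Fortran sketch written as an illustration; it is not taken from the thesis, and the program name, array name, and periodic-neighbor exchange are hypothetical. The syntax follows the Fortran 2008 co-array form of CAF: each image keeps its own copy of the co-array u, and the bracketed reference u(n)[left] is a one-sided read from a neighboring image, the kind of reference that a source-to-source compiler such as cafc translates into Fortran 90 plus communication primitives on cluster platforms.

    ! Hypothetical CAF sketch: each image reads one boundary element
    ! from its (periodic) left neighbor through a co-array reference.
    program caf_halo_sketch
      implicit none
      integer, parameter :: n = 100
      real    :: u(0:n+1)[*]              ! co-array: one copy per image
      integer :: me, np, left

      me   = this_image()
      np   = num_images()
      left = merge(np, me - 1, me == 1)   ! wrap around at image 1

      u(1:n) = real(me)                   ! purely local initialization
      sync all                            ! make remote data well-defined

      u(0) = u(n)[left]                   ! one-sided remote read

      sync all
      if (me == 1) print *, 'image', me, 'received halo value', u(0)
    end program caf_halo_sketch

The communication vectorization and aggregation mentioned in the abstract amount to rewriting many such element-wise remote references into fewer, larger transfers, for example reading a whole boundary strip in a single co-array assignment, so that each message carries more data.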
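The scalability analysis can be summarized with an expectations-based score. The formulation below is an editorial sketch consistent with the abstract rather than the exact metric of the thesis: let C_P(c) denote the cost of a calling context c summed over all P processes, and T_P the total aggregate cost of the run on P processes. Under a strong-scaling expectation, the aggregate cost of each context should remain constant as P grows, so any aggregate cost beyond the one-process baseline can be charged to that context as excess work:

    \[
      X_P(c) \;=\; \frac{C_P(c) - C_1(c)}{T_P}
    \]

Contexts with X_P(c) near zero scale as expected, while contexts whose score grows with P are the scalability hot spots that an interactive viewer would rank first.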

Similar articles

GSHMEM: A Portable Library for Lightweight, Shared-Memory, Parallel Programming

As parallel computer systems evolve to address the insatiable need for higher performance in applications from a broad range of science domains, and exhibit ever deeper and broader levels of parallelism, the challenge of programming productivity comes to the forefront. Whereas these systems (and, in some cases, devices) are often constructed as distributed-memory architectures to facilitate eas...

Model Checking with User-Definable Memory Consistency Models

From the viewpoint of performance and scalability, relaxed memory consistency models are common and essential for parallel/distributed programming languages in which multiple processes are able to share a single global address space, such as Partitioned Global Address Space languages. However, a problem with relaxed memory consistency models is that programming is difficult and error-prone beca...

OSPRI: An Optimized One-Sided Communication Runtime for Leadership-Class Machines

Partitioned Global Address Space (PGAS) programming models provide a convenient approach to implementing complex scientific applications by providing access to a large, globally accessible address space. This paper describes the design, implementation and performance of a new one-sided communication library that attempts to meet the needs of PGAS models, particularly Global Arrays, but hopefull...

A scalable replay-based infrastructure for the performance analysis of one-sided communication

Partitioned global address space (PGAS) languages combine the convenient abstraction of shared memory with the notion of affinity, extending multi-threaded programming to large-scale systems with physically distributed memory. However, in spite of their obvious advantages, PGAS languages still lack appropriate tool support for performance analysis, one of the reasons why their adoption is still...

Porting GASNet to Portals: Partitioned Global Address Space (PGAS) Language Support for the Cray XT

Partitioned Global Address Space (PGAS) Languages are an emerging alternative to MPI for HPC applications development. The GASNet library from Lawrence Berkeley National Lab and the University of California at Berkeley provides the network runtime for multiple implementations of four PGAS Languages: Unified Parallel C (UPC), Co-Array Fortran (CAF), Titanium and Chapel. GASNet provides a low ove...


Journal:

Volume   Issue

Pages  -

Publication date: 2007